Overview

Dataset statistics

Number of variables: 15
Number of observations: 645816
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 90542
Duplicate rows (%): 14.0%
Total size in memory: 73.9 MiB
Average record size in memory: 120.0 B

Variable types

Numeric: 9
Categorical: 6

Alerts

year has constant value "2019" (Constant)
month has constant value "11" (Constant)
Weekend has constant value "0" (Constant)
Dataset has 90542 (14.0%) duplicate rows (Duplicates)
NumOfEventsInJourney is highly correlated with NumSessions and 2 other fields (High correlation)
NumSessions is highly correlated with NumOfEventsInJourney and 3 other fields (High correlation)
interactionTime is highly correlated with NumSessions (High correlation)
maxPrice is highly correlated with minPrice (High correlation)
minPrice is highly correlated with maxPrice (High correlation)
NumCart is highly correlated with NumOfEventsInJourney and 3 other fields (High correlation)
NumView is highly correlated with NumOfEventsInJourney and 4 other fields (High correlation)
InsessionCart is highly correlated with NumCart and 1 other fields (High correlation)
year is highly correlated with Weekend and 4 other fields (High correlation)
Weekend is highly correlated with year and 4 other fields (High correlation)
Purchase is highly correlated with NumView (High correlation)
timeOfDay is highly correlated with year and 2 other fields (High correlation)
weekday is highly correlated with year and 2 other fields (High correlation)
month is highly correlated with year and 4 other fields (High correlation)
InsessionView is highly skewed (γ1 = 28.41525841) (Skewed)
interactionTime has 626461 (97.0%) zeros (Zeros)
NumCart has 616828 (95.5%) zeros (Zeros)
NumView has 33310 (5.2%) zeros (Zeros)
InsessionCart has 613920 (95.1%) zeros (Zeros)
InsessionView has 30664 (4.7%) zeros (Zeros)
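
The Constant and Duplicates alerts can be reproduced in spirit with a few lines of standard-library Python. This is a minimal sketch, not the profiler's own logic: the helper names and the toy records are hypothetical, and "duplicate" here means a row that repeats an earlier row, which may differ from the definition the report used.

```python
from collections import Counter


def constant_columns(rows):
    """Return the names of columns that hold a single distinct value."""
    if not rows:
        return []
    return [name for name in rows[0] if len({row[name] for row in rows}) == 1]


def count_duplicate_rows(rows):
    """Count rows that repeat an earlier row (first occurrences are not counted)."""
    counts = Counter(tuple(sorted(row.items())) for row in rows)
    return sum(n - 1 for n in counts.values())


# Hypothetical toy records that reuse a few of the report's column names.
rows = [
    {"year": "2019", "month": "11", "NumView": 1, "Purchase": "0"},
    {"year": "2019", "month": "11", "NumView": 1, "Purchase": "0"},
    {"year": "2019", "month": "11", "NumView": 2, "Purchase": "1"},
    {"year": "2019", "month": "11", "NumView": 0, "Purchase": "0"},
]

constant = constant_columns(rows)        # columns that carry no information
duplicates = count_duplicate_rows(rows)  # repeated rows beyond the first copy
print(constant, duplicates)
```

Constant columns such as year, month and Weekend carry no information for modelling, so they are natural candidates to drop before further analysis.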

Reproduction

Analysis started: 2022-09-27 02:06:55.602402
Analysis finished: 2022-09-27 02:07:16.497782
Duration: 20.9 seconds
Software version: pandas-profiling v3.3.0
Download configuration: config.json

Variables

NumOfEventsInJourney
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 11
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1.033233924
Minimum: 1
Maximum: 12
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 4.9 MiB

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 1
Median: 1
Q3: 1
95-th percentile: 1
Maximum: 12
Range: 11
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 0.2018230987
Coefficient of variation (CV): 0.1953314675
Kurtosis: 117.1425717
Mean: 1.033233924
Median Absolute Deviation (MAD): 0
Skewness: 8.150669616
Sum: 667279
Variance: 0.04073256317
Monotonicity: Not monotonic
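
Several of the descriptive statistics are tied together by simple identities: the mean is the sum divided by the number of observations, the coefficient of variation is the standard deviation divided by the mean, and the variance is the squared standard deviation. The sketch below checks these identities against the figures reported for NumOfEventsInJourney; the variable names are illustrative only.

```python
import math

# Figures copied from the report (Overview and NumOfEventsInJourney).
n = 645816                  # Number of observations
total = 667279              # Sum
mean = 1.033233924          # Mean
std = 0.2018230987          # Standard deviation
cv = 0.1953314675           # Coefficient of variation (CV)
variance = 0.04073256317    # Variance

mean_from_sum = total / n        # mean = sum / n
cv_from_std = std / mean         # CV = standard deviation / mean
variance_from_std = std ** 2     # variance = standard deviation squared

for label, reported, derived in [
    ("mean", mean, mean_from_sum),
    ("cv", cv, cv_from_std),
    ("variance", variance, variance_from_std),
]:
    # The reported values are rounded, so compare with a tolerance.
    print(label, math.isclose(reported, derived, rel_tol=1e-6))
```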
Histogram with fixed size bins (bins=11)

Value  Count   Frequency (%)
1      626424  97.0%
2      17733   2.7%
3      1394    0.2%
4      187     < 0.1%
5      48      < 0.1%
6      13      < 0.1%
7      6       < 0.1%
9      5       < 0.1%
8      4       < 0.1%
10     1       < 0.1%

Value  Count   Frequency (%)
1      626424  97.0%
2      17733   2.7%
3      1394    0.2%
4      187     < 0.1%
5      48      < 0.1%
6      13      < 0.1%
7      6       < 0.1%
8      4       < 0.1%
9      5       < 0.1%
10     1       < 0.1%

Value  Count   Frequency (%)
12     1       < 0.1%
10     1       < 0.1%
9      5       < 0.1%
8      4       < 0.1%
7      6       < 0.1%
6      13      < 0.1%
5      48      < 0.1%
4      187     < 0.1%
3      1394    0.2%
2      17733   2.7%

NumSessions
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 10
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1.019697561
Minimum: 1
Maximum: 11
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 4.9 MiB

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 1
Median: 1
Q3: 1
95-th percentile: 1
Maximum: 11
Range: 10
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 0.1562473221
Coefficient of variation (CV): 0.1532290829
Kurtosis: 213.1474231
Mean: 1.019697561
Median Absolute Deviation (MAD): 0
Skewness: 10.92140517
Sum: 658537
Variance: 0.02441322566
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=10)

Value  Count   Frequency (%)
1      634276  98.2%
2      10620   1.6%
3      760     0.1%
4      110     < 0.1%
5      27      < 0.1%
6      11      < 0.1%
8      4       < 0.1%
9      4       < 0.1%
7      3       < 0.1%
11     1       < 0.1%

Value  Count   Frequency (%)
1      634276  98.2%
2      10620   1.6%
3      760     0.1%
4      110     < 0.1%
5      27      < 0.1%
6      11      < 0.1%
7      3       < 0.1%
8      4       < 0.1%
9      4       < 0.1%
11     1       < 0.1%

Value  Count   Frequency (%)
11     1       < 0.1%
9      4       < 0.1%
8      4       < 0.1%
7      3       < 0.1%
6      11      < 0.1%
5      27      < 0.1%
4      110     < 0.1%
3      760     0.1%
2      10620   1.6%
1      634276  98.2%

interactionTime
Real number (ℝ≥0)

HIGH CORRELATION
ZEROS

Distinct: 11987
Distinct (%): 1.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 5817.817109
Minimum: 0
Maximum: 2526451
Zeros: 626461
Zeros (%): 97.0%
Negative: 0
Negative (%): 0.0%
Memory size: 4.9 MiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
Median: 0
Q3: 0
95-th percentile: 0
Maximum: 2526451
Range: 2526451
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 71170.31405
Coefficient of variation (CV): 12.23316456
Kurtosis: 353.2111249
Mean: 5817.817109
Median Absolute Deviation (MAD): 0
Skewness: 17.12366893
Sum: 3757239374
Variance: 5065213601
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Value                 Count   Frequency (%)
0                     626461  97.0%
14                    94      < 0.1%
16                    83      < 0.1%
23                    83      < 0.1%
22                    81      < 0.1%
19                    80      < 0.1%
11                    76      < 0.1%
17                    75      < 0.1%
8                     75      < 0.1%
12                    74      < 0.1%
Other values (11977)  18634   2.9%

Value  Count   Frequency (%)
0      626461  97.0%
1      37      < 0.1%
2      35      < 0.1%
3      45      < 0.1%
4      63      < 0.1%
5      58      < 0.1%
6      45      < 0.1%
7      73      < 0.1%
8      75      < 0.1%
9      66      < 0.1%

Value    Count  Frequency (%)
2526451  1      < 0.1%
2520440  1      < 0.1%
2508763  1      < 0.1%
2494377  1      < 0.1%
2478403  1      < 0.1%
2473321  1      < 0.1%
2442933  1      < 0.1%
2436952  1      < 0.1%
2433976  1      < 0.1%
2419162  1      < 0.1%

maxPrice
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 30870
Distinct (%): 4.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 291.4637556
Minimum: 0
Maximum: 2574.07
Zeros: 1799
Zeros (%): 0.3%
Negative: 0
Negative (%): 0.0%
Memory size: 4.9 MiB

Quantile statistics

Minimum: 0
5-th percentile: 19.18
Q1: 68.47
Median: 164.48
Q3: 360.34
95-th percentile: 991.7825
Maximum: 2574.07
Range: 2574.07
Interquartile range (IQR): 291.87

Descriptive statistics

Standard deviation: 356.3124965
Coefficient of variation (CV): 1.222493328
Kurtosis: 8.567720794
Mean: 291.4637556
Median Absolute Deviation (MAD): 116.09
Skewness: 2.575182739
Sum: 188231956.8
Variance: 126958.5952
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Value                 Count   Frequency (%)
154.42                2633    0.4%
89.84                 2470    0.4%
334.37                2086    0.3%
231.64                1887    0.3%
0                     1799    0.3%
308.63                1787    0.3%
643.23                1776    0.3%
308.86                1635    0.3%
51.46                 1594    0.2%
82.63                 1561    0.2%
Other values (30860)  626588  97.0%

Value  Count  Frequency (%)
0      1799   0.3%
0.77   3      < 0.1%
0.79   3      < 0.1%
0.8    2      < 0.1%
0.81   3      < 0.1%
0.83   6      < 0.1%
0.85   5      < 0.1%
0.87   6      < 0.1%
0.88   20     < 0.1%
0.91   4      < 0.1%

Value    Count  Frequency (%)
2574.07  73     < 0.1%
2574.04  114    < 0.1%
2573.99  12     < 0.1%
2573.81  51     < 0.1%
2573.79  121    < 0.1%
2573.76  1      < 0.1%
2573.45  5      < 0.1%
2573.29  2      < 0.1%
2573.17  1      < 0.1%
2572.23  104    < 0.1%

minPrice
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 30856
Distinct (%): 4.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 291.3639305
Minimum: 0
Maximum: 2574.07
Zeros: 1813
Zeros (%): 0.3%
Negative: 0
Negative (%): 0.0%
Memory size: 4.9 MiB

Quantile statistics

Minimum: 0
5-th percentile: 19.13
Q1: 68.47
Median: 164.37
Q3: 360.34
95-th percentile: 991.13
Maximum: 2574.07
Range: 2574.07
Interquartile range (IQR): 291.87

Descriptive statistics

Standard deviation: 356.2108656
Coefficient of variation (CV): 1.222563359
Kurtosis: 8.574145299
Mean: 291.3639305
Median Absolute Deviation (MAD): 115.98
Skewness: 2.575915721
Sum: 188167488.2
Variance: 126886.1808
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Value                 Count   Frequency (%)
154.42                2644    0.4%
89.84                 2467    0.4%
334.37                2084    0.3%
231.64                1883    0.3%
0                     1813    0.3%
308.63                1776    0.3%
643.23                1772    0.3%
308.86                1625    0.3%
51.46                 1596    0.2%
82.63                 1561    0.2%
Other values (30846)  626595  97.0%

Value  Count  Frequency (%)
0      1813   0.3%
0.77   3      < 0.1%
0.79   3      < 0.1%
0.8    2      < 0.1%
0.81   3      < 0.1%
0.83   6      < 0.1%
0.85   5      < 0.1%
0.87   6      < 0.1%
0.88   20     < 0.1%
0.91   4      < 0.1%

Value    Count  Frequency (%)
2574.07  73     < 0.1%
2574.04  113    < 0.1%
2573.99  12     < 0.1%
2573.81  51     < 0.1%
2573.79  121    < 0.1%
2573.76  1      < 0.1%
2573.45  5      < 0.1%
2573.29  2      < 0.1%
2573.17  1      < 0.1%
2572.23  104    < 0.1%

NumCart
Real number (ℝ≥0)

HIGH CORRELATION
ZEROS

Distinct: 9
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.04623607963
Minimum: 0
Maximum: 9
Zeros: 616828
Zeros (%): 95.5%
Negative: 0
Negative (%): 0.0%
Memory size: 4.9 MiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
Median: 0
Q3: 0
95-th percentile: 0
Maximum: 9
Range: 9
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 0.2173800252
Coefficient of variation (CV): 4.701523721
Kurtosis: 32.6556108
Mean: 0.04623607963
Median Absolute Deviation (MAD): 0
Skewness: 5.00299969
Sum: 29860
Variance: 0.04725407533
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=9)

Value  Count   Frequency (%)
0      616828  95.5%
1      28210   4.4%
2      710     0.1%
3      55      < 0.1%
4      7       < 0.1%
5      3       < 0.1%
9      1       < 0.1%
6      1       < 0.1%
7      1       < 0.1%

Value  Count   Frequency (%)
0      616828  95.5%
1      28210   4.4%
2      710     0.1%
3      55      < 0.1%
4      7       < 0.1%
5      3       < 0.1%
6      1       < 0.1%
7      1       < 0.1%
9      1       < 0.1%

Value  Count   Frequency (%)
9      1       < 0.1%
7      1       < 0.1%
6      1       < 0.1%
5      3       < 0.1%
4      7       < 0.1%
3      55      < 0.1%
2      710     0.1%
1      28210   4.4%
0      616828  95.5%

NumView
Real number (ℝ≥0)

HIGH CORRELATION
ZEROS

Distinct: 10
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.9729412093
Minimum: 0
Maximum: 9
Zeros: 33310
Zeros (%): 5.2%
Negative: 0
Negative (%): 0.0%
Memory size: 4.9 MiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 1
Median: 1
Q3: 1
95-th percentile: 1
Maximum: 9
Range: 9
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 0.2832555948
Coefficient of variation (CV): 0.2911333101
Kurtosis: 21.43012545
Mean: 0.9729412093
Median Absolute Deviation (MAD): 0
Skewness: -0.0201645368
Sum: 628341
Variance: 0.080233732
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=10)

Value  Count   Frequency (%)
1      597891  92.6%
0      33310   5.2%
2      13613   2.1%
3      857     0.1%
4      106     < 0.1%
5      21      < 0.1%
6      10      < 0.1%
7      3       < 0.1%
9      3       < 0.1%
8      2       < 0.1%

Value  Count   Frequency (%)
0      33310   5.2%
1      597891  92.6%
2      13613   2.1%
3      857     0.1%
4      106     < 0.1%
5      21      < 0.1%
6      10      < 0.1%
7      3       < 0.1%
8      2       < 0.1%
9      3       < 0.1%

Value  Count   Frequency (%)
9      3       < 0.1%
8      2       < 0.1%
7      3       < 0.1%
6      10      < 0.1%
5      21      < 0.1%
4      106     < 0.1%
3      857     0.1%
2      13613   2.1%
1      597891  92.6%
0      33310   5.2%

InsessionCart
Real number (ℝ≥0)

HIGH CORRELATION
ZEROS

Distinct: 8
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.0508116863
Minimum: 0
Maximum: 8
Zeros: 613920
Zeros (%): 95.1%
Negative: 0
Negative (%): 0.0%
Memory size: 4.9 MiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
Median: 0
Q3: 0
95-th percentile: 0
Maximum: 8
Range: 8
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 0.2268888315
Coefficient of variation (CV): 4.465288362
Kurtosis: 26.38035737
Mean: 0.0508116863
Median Absolute Deviation (MAD): 0
Skewness: 4.664764099
Sum: 32815
Variance: 0.05147854186
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=8)

Value  Count   Frequency (%)
0      613920  95.1%
1      31065   4.8%
2      766     0.1%
3      52      < 0.1%
4      8       < 0.1%
5      3       < 0.1%
8      1       < 0.1%
7      1       < 0.1%

Value  Count   Frequency (%)
0      613920  95.1%
1      31065   4.8%
2      766     0.1%
3      52      < 0.1%
4      8       < 0.1%
5      3       < 0.1%
7      1       < 0.1%
8      1       < 0.1%

Value  Count   Frequency (%)
8      1       < 0.1%
7      1       < 0.1%
5      3       < 0.1%
4      8       < 0.1%
3      52      < 0.1%
2      766     0.1%
1      31065   4.8%
0      613920  95.1%

InsessionView
Real number (ℝ≥0)

SKEWED
ZEROS

Distinct: 21
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1.198116801
Minimum: 0
Maximum: 68
Zeros: 30664
Zeros (%): 4.7%
Negative: 0
Negative (%): 0.0%
Memory size: 4.9 MiB

Quantile statistics

Minimum: 0
5-th percentile: 1
Q1: 1
Median: 1
Q3: 1
95-th percentile: 2
Maximum: 68
Range: 68
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 0.8756566779
Coefficient of variation (CV): 0.7308608619
Kurtosis: 1911.400458
Mean: 1.198116801
Median Absolute Deviation (MAD): 0
Skewness: 28.41525841
Sum: 773763
Variance: 0.7667746176
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=21)

Value              Count   Frequency (%)
1                  500860  77.6%
2                  87158   13.5%
0                  30664   4.7%
3                  18927   2.9%
4                  5300    0.8%
5                  1662    0.3%
6                  686     0.1%
7                  233     < 0.1%
8                  78      < 0.1%
9                  66      < 0.1%
Other values (11)  182     < 0.1%

Value  Count   Frequency (%)
0      30664   4.7%
1      500860  77.6%
2      87158   13.5%
3      18927   2.9%
4      5300    0.8%
5      1662    0.3%
6      686     0.1%
7      233     < 0.1%
8      78      < 0.1%
9      66      < 0.1%

Value  Count  Frequency (%)
68     34     < 0.1%
34     17     < 0.1%
32     21     < 0.1%
23     12     < 0.1%
22     14     < 0.1%
20     10     < 0.1%
19     21     < 0.1%
13     5      < 0.1%
12     17     < 0.1%
11     7      < 0.1%

year
Categorical

CONSTANT
HIGH CORRELATION
REJECTED

Distinct: 1
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 4.9 MiB
2019: 645816

Length

Max length: 4
Median length: 4
Mean length: 4
Min length: 4

Characters and Unicode

Total characters: 2583264
Distinct characters: 4
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 2019
2nd row: 2019
3rd row: 2019
4th row: 2019
5th row: 2019

Common Values

Value  Count   Frequency (%)
2019   645816  100.0%

Length

Histogram of lengths of the category

Category Frequency Plot

Value  Count   Frequency (%)
2019   645816  100.0%

Most occurring characters

Value  Count   Frequency (%)
2      645816  25.0%
0      645816  25.0%
1      645816  25.0%
9      645816  25.0%

Most occurring categories

Value           Count    Frequency (%)
Decimal Number  2583264  100.0%

Most frequent character per category

Decimal Number
Value  Count   Frequency (%)
2      645816  25.0%
0      645816  25.0%
1      645816  25.0%
9      645816  25.0%

Most occurring scripts

Value   Count    Frequency (%)
Common  2583264  100.0%

Most frequent character per script

Common
Value  Count   Frequency (%)
2      645816  25.0%
0      645816  25.0%
1      645816  25.0%
9      645816  25.0%

Most occurring blocks

Value  Count    Frequency (%)
ASCII  2583264  100.0%

Most frequent character per block

ASCII
Value  Count   Frequency (%)
2      645816  25.0%
0      645816  25.0%
1      645816  25.0%
9      645816  25.0%

month
Categorical

CONSTANT
HIGH CORRELATION
REJECTED

Distinct: 1
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 4.9 MiB
11: 645816

Length

Max length: 2
Median length: 2
Mean length: 2
Min length: 2

Characters and Unicode

Total characters: 1291632
Distinct characters: 1
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 11
2nd row: 11
3rd row: 11
4th row: 11
5th row: 11

Common Values

Value  Count   Frequency (%)
11     645816  100.0%

Length

Histogram of lengths of the category

Category Frequency Plot

Value  Count   Frequency (%)
11     645816  100.0%

Most occurring characters

Value  Count    Frequency (%)
1      1291632  100.0%

Most occurring categories

Value           Count    Frequency (%)
Decimal Number  1291632  100.0%

Most frequent character per category

Decimal Number
Value  Count    Frequency (%)
1      1291632  100.0%

Most occurring scripts

Value   Count    Frequency (%)
Common  1291632  100.0%

Most frequent character per script

Common
Value  Count    Frequency (%)
1      1291632  100.0%

Most occurring blocks

Value  Count    Frequency (%)
ASCII  1291632  100.0%

Most frequent character per block

ASCII
Value  Count    Frequency (%)
1      1291632  100.0%

weekday
Categorical

HIGH CORRELATION

Distinct: 7
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 4.9 MiB
Sat: 126504
Fr: 123677
Sun: 110305
Thu: 78638
Mon: 71479
Other values (2): 135213

Length

Max length: 3
Median length: 3
Mean length: 2.808494989
Min length: 2

Characters and Unicode

Total characters: 1813771
Distinct characters: 14
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Sat
2nd row: Fr
3rd row: Thu
4th row: Mon
5th row: Tue

Common Values

Value  Count   Frequency (%)
Sat    126504  19.6%
Fr     123677  19.2%
Sun    110305  17.1%
Thu    78638   12.2%
Mon    71479   11.1%
Tue    67850   10.5%
Wed    67363   10.4%

Length

Histogram of lengths of the category

Category Frequency Plot

Value  Count   Frequency (%)
sat    126504  19.6%
fr     123677  19.2%
sun    110305  17.1%
thu    78638   12.2%
mon    71479   11.1%
tue    67850   10.5%
wed    67363   10.4%

Most occurring characters

Value             Count   Frequency (%)
u                 256793  14.2%
S                 236809  13.1%
n                 181784  10.0%
T                 146488  8.1%
e                 135213  7.5%
a                 126504  7.0%
t                 126504  7.0%
F                 123677  6.8%
r                 123677  6.8%
h                 78638   4.3%
Other values (4)  277684  15.3%

Most occurring categories

Value             Count    Frequency (%)
Lowercase Letter  1167955  64.4%
Uppercase Letter  645816   35.6%

Most frequent character per category

Lowercase Letter
Value  Count   Frequency (%)
u      256793  22.0%
n      181784  15.6%
e      135213  11.6%
a      126504  10.8%
t      126504  10.8%
r      123677  10.6%
h      78638   6.7%
o      71479   6.1%
d      67363   5.8%

Uppercase Letter
Value  Count   Frequency (%)
S      236809  36.7%
T      146488  22.7%
F      123677  19.2%
M      71479   11.1%
W      67363   10.4%

Most occurring scripts

Value  Count    Frequency (%)
Latin  1813771  100.0%

Most frequent character per script

Latin
Value             Count   Frequency (%)
u                 256793  14.2%
S                 236809  13.1%
n                 181784  10.0%
T                 146488  8.1%
e                 135213  7.5%
a                 126504  7.0%
t                 126504  7.0%
F                 123677  6.8%
r                 123677  6.8%
h                 78638   4.3%
Other values (4)  277684  15.3%

Most occurring blocks

Value  Count    Frequency (%)
ASCII  1813771  100.0%

Most frequent character per block

ASCII
Value             Count   Frequency (%)
u                 256793  14.2%
S                 236809  13.1%
n                 181784  10.0%
T                 146488  8.1%
e                 135213  7.5%
a                 126504  7.0%
t                 126504  7.0%
F                 123677  6.8%
r                 123677  6.8%
h                 78638   4.3%
Other values (4)  277684  15.3%

timeOfDay
Categorical

HIGH CORRELATION

Distinct: 7
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 4.9 MiB
Afternoon: 165214
EarlyMorning: 136248
Evening: 110578
Morning: 103407
Dawn: 80495
Other values (2): 49874

Length

Max length: 12
Median length: 9
Mean length: 7.985099471
Min length: 4

Characters and Unicode

Total characters: 5156905
Distinct characters: 19
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Morning
2nd row: Dawn
3rd row: Afternoon
4th row: Noon
5th row: Noon

Common Values

Value         Count   Frequency (%)
Afternoon     165214  25.6%
EarlyMorning  136248  21.1%
Evening       110578  17.1%
Morning       103407  16.0%
Dawn          80495   12.5%
Noon          34242   5.3%
Night         15632   2.4%

Length

Histogram of lengths of the category

Category Frequency Plot

Value         Count   Frequency (%)
afternoon     165214  25.6%
earlymorning  136248  21.1%
evening       110578  17.1%
morning       103407  16.0%
dawn          80495   12.5%
noon          34242   5.3%
night         15632   2.4%

Most occurring characters

Value             Count    Frequency (%)
n                 1145631  22.2%
o                 638567   12.4%
r                 541117   10.5%
i                 365865   7.1%
g                 365865   7.1%
e                 275792   5.3%
E                 246826   4.8%
M                 239655   4.6%
a                 216743   4.2%
t                 180846   3.5%
Other values (9)  939998   18.2%

Most occurring categories

Value             Count    Frequency (%)
Lowercase Letter  4374841  84.8%
Uppercase Letter  782064   15.2%

Most frequent character per category

Lowercase Letter
Value             Count    Frequency (%)
n                 1145631  26.2%
o                 638567   14.6%
r                 541117   12.4%
i                 365865   8.4%
g                 365865   8.4%
e                 275792   6.3%
a                 216743   5.0%
t                 180846   4.1%
f                 165214   3.8%
l                 136248   3.1%
Other values (4)  342953   7.8%

Uppercase Letter
Value  Count   Frequency (%)
E      246826  31.6%
M      239655  30.6%
A      165214  21.1%
D      80495   10.3%
N      49874   6.4%

Most occurring scripts

Value  Count    Frequency (%)
Latin  5156905  100.0%

Most frequent character per script

Latin
Value             Count    Frequency (%)
n                 1145631  22.2%
o                 638567   12.4%
r                 541117   10.5%
i                 365865   7.1%
g                 365865   7.1%
e                 275792   5.3%
E                 246826   4.8%
M                 239655   4.6%
a                 216743   4.2%
t                 180846   3.5%
Other values (9)  939998   18.2%

Most occurring blocks

Value  Count    Frequency (%)
ASCII  5156905  100.0%

Most frequent character per block

ASCII
Value             Count    Frequency (%)
n                 1145631  22.2%
o                 638567   12.4%
r                 541117   10.5%
i                 365865   7.1%
g                 365865   7.1%
e                 275792   5.3%
E                 246826   4.8%
M                 239655   4.6%
a                 216743   4.2%
t                 180846   3.5%
Other values (9)  939998   18.2%

Weekend
Categorical

CONSTANT
HIGH CORRELATION
REJECTED

Distinct: 1
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 4.9 MiB
0: 645816

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 645816
Distinct characters: 1
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0
2nd row: 0
3rd row: 0
4th row: 0
5th row: 0

Common Values

Value  Count   Frequency (%)
0      645816  100.0%

Length

Histogram of lengths of the category

Category Frequency Plot

Value  Count   Frequency (%)
0      645816  100.0%

Most occurring characters

Value  Count   Frequency (%)
0      645816  100.0%

Most occurring categories

Value           Count   Frequency (%)
Decimal Number  645816  100.0%

Most frequent character per category

Decimal Number
Value  Count   Frequency (%)
0      645816  100.0%

Most occurring scripts

Value   Count   Frequency (%)
Common  645816  100.0%

Most frequent character per script

Common
Value  Count   Frequency (%)
0      645816  100.0%

Most occurring blocks

Value  Count   Frequency (%)
ASCII  645816  100.0%

Most frequent character per block

ASCII
Value  Count   Frequency (%)
0      645816  100.0%

Purchase
Categorical

HIGH CORRELATION

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 4.9 MiB
0: 636839
1: 8977

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 645816
Distinct characters: 2
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0
2nd row: 0
3rd row: 0
4th row: 0
5th row: 0

Common Values

Value  Count   Frequency (%)
0      636839  98.6%
1      8977    1.4%

Length

Histogram of lengths of the category

Category Frequency Plot

Value  Count   Frequency (%)
0      636839  98.6%
1      8977    1.4%

Most occurring characters

Value  Count   Frequency (%)
0      636839  98.6%
1      8977    1.4%

Most occurring categories

Value           Count   Frequency (%)
Decimal Number  645816  100.0%

Most frequent character per category

Decimal Number
Value  Count   Frequency (%)
0      636839  98.6%
1      8977    1.4%

Most occurring scripts

Value   Count   Frequency (%)
Common  645816  100.0%

Most frequent character per script

Common
Value  Count   Frequency (%)
0      636839  98.6%
1      8977    1.4%

Most occurring blocks

Value  Count   Frequency (%)
ASCII  645816  100.0%

Most frequent character per block

ASCII
Value  Count   Frequency (%)
0      636839  98.6%
1      8977    1.4%

Interactions

[81 interaction plots, one per ordered pair of the 9 numeric variables; images not preserved]

Correlations


Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation, and +1 indicating total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
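
The recipe above can be sketched in plain Python: rank both variables (tied values share the mean of their positions), then apply the Pearson formula to the ranks. The function names are illustrative and are not taken from the profiler; the toy data is chosen so that the relationship is monotonic but not linear.

```python
import math


def average_ranks(values):
    """Rank values from 1 to n; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    """Covariance divided by the product of the standard deviations.

    The 1/n factors cancel, so plain sums of deviations are used.
    """
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman_rho(x, y):
    """Pearson correlation of the rank variables."""
    return pearson(average_ranks(x), average_ranks(y))


x = [1, 2, 3, 4, 5]
y = [1, 8, 27, 64, 125]  # y = x**3: monotonic but not linear

rho = spearman_rho(x, y)  # reaches 1 because the ranks agree exactly
r = pearson(x, y)         # stays below 1 because the relation is not linear
print(rho, r)
```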

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 no linear correlation, and +1 total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, so for a linear relationship the steepness of the line (its angle to the x-axis) does not affect r; only the direction of the slope determines its sign.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
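
As a rough illustration of this formula and of the location and scale invariance mentioned above, the following standard-library sketch computes r directly. The function name `pearson_r` and the sample values are assumptions made for the example and are not taken from the dataset.

```python
from math import sqrt


def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    # The 1/n factors of the covariance and the standard deviations cancel,
    # so plain sums of products and squares are sufficient.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("r is undefined when a variable is constant")
    return cov / (spread_x * spread_y)


x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r_original = pearson_r(x, y)

# Shift and rescale each variable separately (with positive scale factors).
x_moved = [2 * a + 3 for a in x]
y_moved = [0.5 * b - 7 for b in y]
r_moved = pearson_r(x_moved, y_moved)   # equal to r_original up to rounding

# Two exact lines with very different slopes both give r = 1.
r_steep = pearson_r(x, [30 * a + 1 for a in x])
r_flat = pearson_r(x, [0.1 * a for a in x])
```

The explicit check for a constant variable is a design choice of this sketch: the standard deviation is then zero and the quotient is undefined.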

Kendall's τ

Like Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 no correlation, and +1 total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
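
The pair-counting definition above corresponds to the simplest variant, usually called τ-a. The sketch below implements exactly that definition with the standard library; the function name `kendall_tau_a` is hypothetical, and libraries commonly apply a tie-adjusted variant instead, so results on data with many ties can differ from those shown in the report.

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """(concordant - discordant) / total number of pairs, as in the text above."""
    concordant = discordant = pairs = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        pairs += 1
        direction = (x1 - x2) * (y1 - y2)
        if direction > 0:
            concordant += 1      # both variables move in the same direction
        elif direction < 0:
            discordant += 1      # the variables move in opposite directions
        # direction == 0 means a tie in x or y; it counts toward neither group
    return (concordant - discordant) / pairs


# Four observations give six pairs: five concordant and one discordant.
tau = kendall_tau_a([1, 2, 3, 4], [1, 3, 2, 4])
```

This loop visits every pair, so its cost grows quadratically with the number of observations; it is meant to show the definition rather than to scale to a dataset of this size.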

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been shown to be biased, even for large samples. We use the bias-corrected measure proposed by Bergsma in 2013.
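
To make the bias correction concrete, here is a minimal standard-library sketch of the corrected estimator as it is commonly stated: φ² = χ²/n is reduced by (r-1)(k-1)/(n-1), and the row and column counts r and k are shrunk by (r-1)²/(n-1) and (k-1)²/(n-1) before taking the minimum. The function name `cramers_v_corrected` and the toy inputs are assumptions, and returning 0 for a constant variable is a choice made for this sketch.

```python
from collections import Counter
from math import sqrt


def cramers_v_corrected(a, b):
    """Bias-corrected Cramér's V for two equally long sequences of labels."""
    n = len(a)
    joint = Counter(zip(a, b))          # observed cell counts
    row_totals = Counter(a)
    col_totals = Counter(b)
    r, k = len(row_totals), len(col_totals)

    # Pearson's chi-squared statistic over the full contingency table.
    chi2 = 0.0
    for row_label, row_n in row_totals.items():
        for col_label, col_n in col_totals.items():
            expected = row_n * col_n / n
            observed = joint.get((row_label, col_label), 0)
            chi2 += (observed - expected) ** 2 / expected

    phi2 = chi2 / n
    phi2_corrected = max(0.0, phi2 - (r - 1) * (k - 1) / (n - 1))
    r_corrected = r - (r - 1) ** 2 / (n - 1)
    k_corrected = k - (k - 1) ** 2 / (n - 1)
    denominator = min(r_corrected - 1, k_corrected - 1)
    if denominator <= 0:
        return 0.0                      # a constant variable carries no association
    return sqrt(phi2_corrected / denominator)


# Perfect association: each label of `a` always appears with the same label of `b`.
v_perfect = cramers_v_corrected(["x", "y"] * 50, ["p", "q"] * 50)

# Independence: every combination of labels occurs equally often.
v_independent = cramers_v_corrected(["x", "x", "y", "y"] * 25, ["p", "q", "p", "q"] * 25)
```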

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available for the phik package.

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.

Sample

First rows

        NumOfEventsInJourney  NumSessions  interactionTime  maxPrice  minPrice  NumCart  NumView  InsessionCart  InsessionView  year  month  weekday  timeOfDay     Weekend  Purchase
0       1                     1            0.0              154.41    154.41    0        1        0              1              2019  11     Sat      Morning       0        0
1       1                     1            0.0              92.67     92.67     0        1        0              1              2019  11     Fr       Dawn          0        0
2       1                     1            0.0              155.71    155.71    0        1        0              1              2019  11     Thu      Afternoon     0        0
3       1                     1            0.0              898.32    898.32    0        1        0              1              2019  11     Mon      Noon          0        0
4       1                     1            0.0              146.21    146.21    0        1        0              1              2019  11     Tue      Noon          0        0
5       1                     1            0.0              244.54    244.54    0        1        0              1              2019  11     Sat      Noon          0        0
6       1                     1            0.0              234.24    234.24    0        1        0              1              2019  11     Sat      Morning       0        0
7       1                     1            0.0              463.31    463.31    0        1        0              1              2019  11     Wed      Evening       0        0
8       1                     1            0.0              253.25    253.25    0        1        0              1              2019  11     Thu      Afternoon     0        0
9       1                     1            0.0              450.18    450.18    0        1        0              1              2019  11     Mon      Morning       0        0

Last rows

        NumOfEventsInJourney  NumSessions  interactionTime  maxPrice  minPrice  NumCart  NumView  InsessionCart  InsessionView  year  month  weekday  timeOfDay     Weekend  Purchase
645806  1                     1            0.0              2239.18   2239.18   0        1        0              1              2019  11     Mon      EarlyMorning  0        0
645807  1                     1            0.0              121.24    121.24    0        1        0              1              2019  11     Sat      Dawn          0        0
645808  1                     1            0.0              615.20    615.20    0        1        0              1              2019  11     Sat      Evening       0        0
645809  1                     1            0.0              2254.82   2254.82   0        1        0              2              2019  11     Sat      Morning       0        0
645810  1                     1            0.0              991.06    991.06    0        1        0              1              2019  11     Tue      Dawn          0        0
645811  1                     1            0.0              128.67    128.67    0        1        0              2              2019  11     Sat      EarlyMorning  0        0
645812  1                     1            0.0              244.51    244.51    0        1        0              1              2019  11     Thu      EarlyMorning  0        0
645813  1                     1            0.0              152.82    152.82    0        1        0              1              2019  11     Sun      EarlyMorning  0        0
645814  1                     1            0.0              190.22    190.22    0        1        0              1              2019  11     Wed      Evening       0        0
645815  1                     1            0.0              24.41     24.41     0        1        0              1              2019  11     Wed      Afternoon     0        0

Duplicate rows

Most frequently occurring

        NumOfEventsInJourney  NumSessions  interactionTime  maxPrice  minPrice  NumCart  NumView  InsessionCart  InsessionView  year  month  weekday  timeOfDay     Weekend  Purchase  # duplicates
42570   1                     1            0.0              154.42    154.42    0        1        0              1              2019  11     Fr       Afternoon     0        0         138
83967   1                     1            0.0              914.00    914.00    0        1        0              1              2019  11     Sat      EarlyMorning  0        0         123
56953   1                     1            0.0              243.49    243.49    0        1        0              1              2019  11     Sat      EarlyMorning  0        0         122
36962   1                     1            0.0              128.42    128.42    0        1        0              1              2019  11     Sun      EarlyMorning  0        0         119
36960   1                     1            0.0              128.42    128.42    0        1        0              1              2019  11     Sun      Afternoon     0        0         113
42586   1                     1            0.0              154.42    154.42    0        1        0              1              2019  11     Sat      EarlyMorning  0        0         111
83960   1                     1            0.0              914.00    914.00    0        1        0              1              2019  11     Fr       Dawn          0        0         110
42593   1                     1            0.0              154.42    154.42    0        1        0              1              2019  11     Sun      EarlyMorning  0        0         107
36953   1                     1            0.0              128.42    128.42    0        1        0              1              2019  11     Sat      Afternoon     0        0         106
27274   1                     1            0.0              89.84     89.84     0        1        0              1              2019  11     Sat      Afternoon     0        0         104
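
A listing like the one above can be approximated by counting identical rows and keeping those that occur more than once. The sketch below shows one way to do that with the standard library; the function name `most_frequent_duplicates` and the shortened toy rows are hypothetical, and this is not the report's own implementation.

```python
from collections import Counter


def most_frequent_duplicates(rows, top=10):
    """Return (row, count) for rows seen more than once, most frequent first."""
    counts = Counter(tuple(row) for row in rows)
    duplicated = [(row, count) for row, count in counts.items() if count > 1]
    # Sort by descending count, breaking ties on the row itself for a stable result.
    duplicated.sort(key=lambda item: (-item[1], item[0]))
    return duplicated[:top]


# Toy rows with only three columns (price, weekday, timeOfDay) for brevity.
toy_rows = [
    (10.0, "Sat", "Noon"),
    (10.0, "Sat", "Noon"),
    (10.0, "Sat", "Noon"),
    (25.5, "Mon", "Dawn"),
    (25.5, "Mon", "Dawn"),
    (99.9, "Wed", "Evening"),
]
top_duplicates = most_frequent_duplicates(toy_rows)
```

Rows are converted to tuples so that they can be used as dictionary keys; a row that appears only once, such as the last toy row, is left out of the result.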